SIGMOD RWE Review: “Efficient Parallel Set-Similarity Joins Using MapReduce”

Author

  • Fabian Hueske
Abstract

This document is a review report, prepared by SIGMOD's 2010 Repeatability and Workability Evaluation Committee, on the paper “Efficient Parallel Set-Similarity Joins Using MapReduce” by R. Vernica, M. Carey, and C. Li. This section discusses the provided resources (code, data sets, setup information) and the hardware setups of the authors and reviewers. Detailed information on all experiments that the reviewer conducted or attempted to conduct for repeatability or workability can be found in Sections 2 and 3.


Similar resources

Heads-Join: Efficient Earth Mover's Distance Similarity Joins on Hadoop

The Earth Mover’s Distance (EMD) similarity join has a number of important applications such as near duplicate image retrieval and distributed based pattern analysis. However, the computational cost of EMD is super cubic and consequently the EMD similarity join operation is prohibitive for datasets of even medium size. We propose to employ the Hadoop platform to speed up the operation. Simply p...


MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives ...


Efficient and Scalable Graph Similarity Joins in MapReduce

Along with the emergence of massive graph-modeled data, it is of great importance to investigate graph similarity joins due to their wide applications for multiple purposes, including data cleaning, and near duplicate detection. This paper considers graph similarity joins with edit distance constraints, which return pairs of graphs such that their edit distances are no larger than a given thres...


Efficient Large Outer Joins over MapReduce

Big Data analytics largely rely on being able to execute large joins efficiently. Though inner join approaches have been extensively evaluated in parallel and distributed systems, there is little published work providing analysis of outer joins, especially on the extremely popular MapReduce platform. In this paper, we studied several current algorithms/techniques used in large outer joins. We f...


Cost Based Multi-Way Equi-Join Optimization in MapReduce

MapReduce is a prominent programming model above shared nothing architecture for processing big data with a parallel, distributed algorithm on a cluster. Join is an important operation is very inefficient in MapReduce. In this work, a time cost based evolution model is proposed for multi-way join by considering the time cost calculation. A multi-way join consists of start pattern joins and chai...

Publication date: 2010